Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Attempt to fix Future L1 Track SW + bugs in HLS memory classes. #347

Merged
merged 19 commits into from
Dec 4, 2024

Conversation

tomalin
Copy link
Contributor

@tomalin tomalin commented Sep 13, 2024

The Future L1Trk CMSSW SW has not yet migrated to the 2FPGA solution. I'm therefore trying to bodge it so it continues to work (for now) with the 1FPGA solution.

  1. The three functions in TrackBuilder_parameters.h with names getMPAR*() caused the CMSSW compiler to fail because of "duplicate function definitions". I fixed this by declaring them to be "constexpr" or "inline" in generate_TB.py. I also (on style grounds) changed their template parameter from type int to TF::seed.
  2. MemoryTemplateBinnedCM::clear() used constant NCOPY, which caused array-out-of-bounds errors in CMSSW, where NCOPY can be zero. Replaced NCOPY with NCP.
  3. CMSSW compiler errors about "duplicate code", solved by deleting the calculation of nentries withing CMSSW_GIT_HASH that was in MemoryTemplateBinnedCM & MemoryTemplateBinned & MemoryTemplatePROJ.
  4. Added constants DEPTH_BX and DEPTH_ADDR to MemoryTemplate.h, MemoryTemplateBinnedCM.h and MemoryTemplateTPROJ.h, to simplify them. In MemoryTemplateTPROJ.h, replaced magic number 128 by DEPTH_ADDR.
  5. Modified MemoryTemplateTPROJ to it instantiates only one page of memory when used inside CMSSW.
  6. Replaced argument addr_index of MemoryTemplate::write_mem_new() by argument "ap_uint<1> overwrite", with the same number of bits used by MatchProcessor.h HERE when it calls write_mem_new().
    Howver, this code inside write_mem_new(), which should protect the memory against having too many entries does nothing, since addr_index is a 1 bit number. This is a BUG. I replaced it by a check that nentries_[ibx] is than than (1<<NBIT_ADDR). However, array access can be slow in FW, so one should check if this causes FW timing errors.
  7. Besides the style issue that it's hard-wiring the number of pages to 4, https://github.com/cms-L1TK/firmware-hls/blob/master/TestBenches/FileReadUtility.h#L259 seems to incorrectly assume that MemoryTemplateTPROJ::getDepth() returns the total depth of the memory, whereas it can be seen from https://github.com/cms-L1TK/firmware-hls/blob/master/TrackletAlgorithm/MemoryTemplateTPROJ.h#L49 that it returns a number that is a factor NPAGE less than this. I believe this is a BUG and that compareMemWithMemPage() is only checking one quarter of the memory. Fixed.
  8. SWLUTReader.h was writing by 1 unit beyond the array bounds of the input "lut" array, because it neglected that the input .tab files contain a final line with a "close bracket" but no number. This BUG had horrible consequences, causing all LUT contents to become zero in CMSSW. Now fixed.
  9. Modified download.sh, so that when running the CombinedCM chain, the TP takes its LUT tables from LUTsSplit/ instead of LUTsCM/ . This reduces the rate of memory errors, as LUTsCM/ is out-of-date. All other modules in the chain already used LUTsSplit/.
  10. To improve clarity of VMRouterCM.h & MatchProcessor.h, renamed nAllInnerCopies to nAllInnerVariants, since the versions of the AllStubsInner memories that are produced are not copies of each other, but are different from each other. Similarly, renamed maxFullMatchCopies to maxFullMatchVariants.

THIS PR NEEDS DISCUSSION WITH @aehart and @aryd BEFORE MERGING.

@tomalin
Copy link
Contributor Author

tomalin commented Sep 13, 2024

Strange. This fails git CI because of 2FA authentification issues https://github.com/cms-L1TK/firmware-hls/actions/runs/10853460640/job/30121761689?pr=347 . @aehart the L1Trk CMSSW code in our github cms-L1TK area runs CI in gitlab, and gitlab has a token (which I created) https://gitlab.cern.ch/cms-l1tk/cmssw_CI/-/settings/access_tokens so that it can access our github repo. Our github repo stores this at https://github.com/cms-L1TK/cmssw/settings/secrets/actions . The firmware-hls cms-L1TK github repo has no such tokens https://gitlab.cern.ch/cms-l1tk/firmware-hls/-/settings/access_tokens . But I'd have predicted that this would could 2FA failures for the HLS since Oct. 2023, not just now.

@tomalin
Copy link
Contributor Author

tomalin commented Sep 17, 2024

QUESTIONS/ISSUES:

  1. With this PR, the L1Trk Future SW runs in CMSSW14, but reports huge numbers of memory content inconsistencies in the TP, MP & TB. Although we use the CombinedCM wiring map, we use the memory content .txt files & LUT tables from the FPGASplit/ solution, as they are more up-to-date, and reduce the error count. The errors seen in the TP are due to expected the Tracklets all being found, but stored in a different order in the TPAR memories to in the reference files.
  2. Anders's branch of the old CMSSW emulation no longer writes the 1-FPGA solution memories to the .txt files. This will need fixing if the Future SW is not to see more memory errors.
    • ANDERS suggests to remake the txt files, after modifying TrackletProjectionMemories::writeTPROJ() in the old emulation code, such that e.g. MPROJ_L1L2ABC_D1PHIA pages 1-3 are is split into separate files for page (1,2,3) TPROJ_L1L2(A,B,C)_D1PHIA etc.
  3. The TrackletProcessor writes TrackletParameterMemory objects that inherit from MemoryTemplateTPROJ, and so have 4 pages of memory, despite the fact that the TrackletProcessor only writes to the first page. Is this wasting memory?
    • ANDERS SAYS: They use 1 BRAM, so waste of memory doesn't matter.
  4. Why is MemoryTemplate::write_mem_new() only called for the FullMatchMemory (from MatchProcessor.h), when several other memory types are based on MemoryTemplate. Is this function is an improvement on write_mem(), why not use it for all memory types?
    • ANDERS THINKS write_mem_new() should probably be used everywhere. But needs checking.
  5. MemoryTemplateBinnedCM::write_mem(ibx, slot, data, nentry_ibx) has two checks HERE and HERE that the number of entries is not too big to overflow. We don't need to cut twice, so this is a BUG. The first one cuts on nentry_ibx, which is a historic variable, no longer incremented inside the memory when new entries are added. The second cuts on nentry, which is the new variable, but is only applied when not in synthesis, whereas the first cut is also applied when in synthesis. I'M UNSURE HOW TO FIX THIS.
  6. MemoryTemplate::write_mem() increments the number of entries twice, once HERE and once HERE. Is this a BUG?
  7. Several data members (e.g. ireg__ and ireg___ and innerstub__ and innerstub___ ) are not initialized inside the constructor of TrackletEngineUnit and perhaps they should be set also in TrackletEngineUnit::reset()? I suspect they are first used by TrackletProcessor before they are initialized, which leads to unpredictable results in CMSSW. (If I initialize all these to zero in TrackletEngineUnit, and modify the default constructors of AllStubInnerMemory and VMStubTEOuterMemoryCM, so that they set data_ = 0), I see small changes in Future SW tracking performance in CMSSW 13). Is this a BUG?
  8. All HLS modules should use SWLUTReader() to access their LUT tables to reduce compilation memory use, but only the TrackletProcessor does. There is also no consistency in which directories the LUT tables are read from. The IR & TP read them from LutsSplit/ whereas the other modules read them from the symbolic links such as VMRCM/ . These are equivalent, but we should be consistent.
  9. I can think of no good reason for defining 5 identical typedef's https://github.com/cms-L1TK/firmware-hls/blob/master/TrackletAlgorithm/TrackletParameterMemory.h#L161 . Is this a mistake?

@tomalin tomalin requested review from aryd and aehart September 18, 2024 08:22
@tomalin tomalin changed the title Attempt to fix Future L1 Track SW (DO NOT MERGE YET) Attempt to fix Future L1 Track SW Sep 23, 2024
@tomalin tomalin marked this pull request as ready for review September 24, 2024 08:50
@tomalin tomalin changed the title Attempt to fix Future L1 Track SW Attempt to fix Future L1 Track SW + bugs in HLS memory classes. Sep 24, 2024
@tomalin
Copy link
Contributor Author

tomalin commented Nov 18, 2024

Rebased to latest master

Comment on lines 153 to 166
nentries_[ibx*kNBinsRZ+ibin].range(ireg*4+3,ireg*4)=nentry+1;
if (ibin!=0) {
nentries_[ibx*kNBinsRZ+ibin-1].range((ireg+8)*4+3,(ireg+8)*4)=nentry+1;
}
binmask8_[ibx][ibin].set_bit(ireg,true);
nentries_[ibx*kNBinsRZ+ibin].range(ireg*4+3,ireg*4)=nentry+1;
if (ibin!=0) {
nentries_[ibx*kNBinsRZ+ibin-1].range((ireg+8)*4+3,(ireg+8)*4)=nentry+1;
}
binmask8_[ibx][ibin].set_bit(ireg,true);

//icopy comparison must be signed int or future SW fails
writememloop:for (signed int icopy=0;icopy< (signed) NCP;icopy++) {
//icopy comparison must be signed int or future SW fails
writememloop:for (signed int icopy=0;icopy< (signed) NCP;icopy++) {
#pragma HLS unroll
dataarray_[icopy][ibx][getNEntryPerBin()*slot+nentry] = data;
}
dataarray_[icopy][ibx][getNEntryPerBin()*slot+nentry] = data;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tomalin These changes just seem to affect the indentation, which was already a bit off, and now looks even more off, at least in my text editor. Could you clean this up, perhaps removing tabs altogether and consistently using two spaces to indent lines inside brackets?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is that the spaces in the HLS code are often created with "tab", and the number of spaces that "tab" corresponds to is not well-defined. It differs on different computers and in different editors/viewing tools. To fix this, we'd really have to globally replace all tabs by spaces, and then run clang to enforce a sensible indentation. And ask people to stop using tabs when editing the HLS code. But this should wait until there are no outstanding PRs.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I completely agree, but I also think we can clean things up as we change other things. Especially with this block of code in particular, where the only changes that have been made are to the indentation, unless I've missed something.

Copy link
Contributor Author

@tomalin tomalin Nov 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can do this, but it would mean replacing all tabs with spaces in every line of every file where you've queried my changes to the indentation. And this of course would make it harder during your PR review to spot what I've changed. I suggest I make this change at the end, after I've answered all your non-indentation related comments?
P.S. There are other changes in this file, such as the introduction of "static constexpr unsigned int DEPTH_BX = 1<<NBIT_BX;".

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, these indentation comments are the only ones left. The block highlighted here is the most significant example, since the only changes you made, AFAICT, are to the indentation. Some of those changes also don't make sense, like lines 156 and 160 being in different columns now.

I included suggestions for the other blocks I highlighted that you can simply accept if you like, so I hope these comments are fairly trivial to address.

Comment on lines 193 to 196
for (size_t icopy=0; icopy < NCP; icopy++) {
for (size_t ibin=0; ibin < kNMemDepth; ibin++) {
dataarray_[icopy][ibx][ibin] = data;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tomalin Same comment regarding indentation.

Suggested change
for (size_t icopy=0; icopy < NCP; icopy++) {
for (size_t ibin=0; ibin < kNMemDepth; ibin++) {
dataarray_[icopy][ibx][ibin] = data;
}
for (size_t icopy=0; icopy < NCP; icopy++) {
for (size_t ibin=0; ibin < kNMemDepth; ibin++) {
dataarray_[icopy][ibx][ibin] = data;
}

Comment on lines 92 to 95
//assert(page < NPAGE);
// TODO: check if valid
if(!NBIT_BX) ibx = 0;
return dataarray_[ibx][DEPTH_ADDR*page+index];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tomalin Same comment regarding indentation.

Suggested change
//assert(page < NPAGE);
// TODO: check if valid
if(!NBIT_BX) ibx = 0;
return dataarray_[ibx][DEPTH_ADDR*page+index];
//assert(page < NPAGE);
// TODO: check if valid
if(!NBIT_BX) ibx = 0;
return dataarray_[ibx][DEPTH_ADDR*page+index];

Comment on lines 148 to 151
nentries_[ibx*NPAGE+page] = 0;
for (size_t addr=0; addr<DEPTH_ADDR; ++addr) {
dataarray_[ibx][DEPTH_ADDR*page+addr] = data;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tomalin Same comment regarding indentation.

Suggested change
nentries_[ibx*NPAGE+page] = 0;
for (size_t addr=0; addr<DEPTH_ADDR; ++addr) {
dataarray_[ibx][DEPTH_ADDR*page+addr] = data;
}
nentries_[ibx*NPAGE+page] = 0;
for (size_t addr=0; addr<DEPTH_ADDR; ++addr) {
dataarray_[ibx][DEPTH_ADDR*page+addr] = data;
}

@@ -209,7 +210,7 @@ class MemoryTemplateTPROJ

void print_entry(BunchXingT bx, NEntryT index, unsigned int page = 0) const
{
print_data(dataarray_[bx][(1<<NBIT_ADDR)*page+index]);
print_data(dataarray_[bx][DEPTH_ADDR*page+index]);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tomalin Same comment regarding indentation.

Suggested change
print_data(dataarray_[bx][DEPTH_ADDR*page+index]);
print_data(dataarray_[bx][DEPTH_ADDR*page+index]);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've replaced all tabs by spaces in MemoryTemplate.h MemoryTemplateBinnedCM.h & MemoryTemplateTPROJ.h , and then adjusted the number of spaces, such that it increases by 2, whenever, there is an additional pair of curly brackets.

Copy link
Contributor

@aehart aehart left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These changes look good to me. I'll go ahead and merge now, and we should discuss the remaining open questions, perhaps in the next TF FW meeting.

@aehart aehart merged commit 5f3bd55 into master Dec 4, 2024
1 check passed
@aehart aehart deleted the ianFixFutureSW branch December 4, 2024 21:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants